This report includes data validation, quality assessment, and preliminary analysis for a Water Quality Program’s Black Lake Pollution Identification and Correction Project. Code is included for transparency and reproducibility.
If the decision to accept or reject a value is unclear, the option most protective of public health will be chosen.
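library(ggplot2)
library(dplyr)
library(tidyverse)
library(lubridate)
# Plotly: install from GitHub only if the package is not already available
if (!requireNamespace("plotly", quietly = TRUE)) {
  devtools::install_github("ropensci/plotly")
}
library(plotly)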
Load Dataframe
library(RCurl)
df <- read.csv("https://raw.githubusercontent.com/AlyssaPeter/R-Code-Sample/main/EIM_Black_Lake_PIC_Project_Data.csv", stringsAsFactors = TRUE, fileEncoding = "ISO-8859-1")
df
Check the dataframe for null values.
null_values <- sapply(df, function(x) sum(is.na(x)))
null_values
## Study_ID Location_ID
## 0 0
## Study_Specific_Location_ID Field_Collection_Type
## 0 0
## Field_Collector Field_Collection_Start_Date
## 0 0
## Field_Collection_Comment Latitude_Decimal_Degrees
## 0 0
## Longitude_Decimal_Degrees Sample_ID
## 0 0
## Sample_Field_Replicate_ID Sample_Replicate_Flag
## 0 0
## Sample_Composite_Flag Sample_Matrix
## 0 0
## Sample_Source Sample_Collection_Method
## 0 0
## Sample_Preparation_Method Result_Parameter_Name
## 281 0
## Result_Parameter_CAS_Number Lab_Analysis_Date
## 0 0
## Result_Value Result_Value_Units
## 0 0
## Result_Reporting_Limit Result_Reporting_Limit_Type
## 0 281
## Result_Detection_Limit Result_Detection_Limit_Type
## 0 0
## Result_Data_Qualifier Fraction_Analyzed
## 0 0
## Result_Method Result_Lab_Name
## 0 0
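The count above only covers values that R parsed as NA. Blank fields in text columns are read as empty strings rather than NA, so they are worth counting separately. A minimal sketch on a small hypothetical data frame (`toy` and its values are illustrative, not project data):

```r
# Hypothetical example data; column names mirror the report, values are made up
toy <- data.frame(
  Sample_ID = c("A1", "A2", "A3"),
  Result_Value = c(0.041, NA, 0.015),
  Result_Data_Qualifier = c("", "J", ""),
  stringsAsFactors = FALSE
)

# Count NA values per column and keep only columns that have any
na_counts <- sapply(toy, function(x) sum(is.na(x)))
na_counts[na_counts > 0]

# Count empty strings per column (these are not NA)
empty_counts <- sapply(toy, function(x) sum(!is.na(x) & as.character(x) == ""))
empty_counts[empty_counts > 0]
```

Showing only the non-zero entries makes the two affected columns in the output above easier to spot.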
Check that latitude and longitude are consistent within each Study_Specific_Location_ID. A value of TRUE means the location has a single latitude (or longitude) across all of its records, so the results are okay.
location_coords_consistency <- df %>%
group_by(Study_Specific_Location_ID) %>%
summarise(
consistent_latitude = n_distinct(Latitude_Decimal_Degrees) == 1,
consistent_longitude = n_distinct(Longitude_Decimal_Degrees) == 1
)
location_coords_consistency
## # A tibble: 26 × 3
## Study_Specific_Location_ID consistent_latitude consistent_longitude
## <fct> <lgl> <lgl>
## 1 BLA001 TRUE TRUE
## 2 BLA002 TRUE TRUE
## 3 BLA00201 TRUE TRUE
## 4 BLA00202 TRUE TRUE
## 5 BLA00203 TRUE TRUE
## 6 BLA00204 TRUE TRUE
## 7 BLA003 TRUE TRUE
## 8 BLA00301 TRUE TRUE
## 9 BLA004 TRUE TRUE
## 10 BLA005 TRUE TRUE
## # ℹ 16 more rows
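Because only the first 10 of 26 rows are printed, it can be easier to list just the locations that fail the check. A minimal base R sketch on hypothetical coordinates (the `toy` data frame and site names `S1`, `S2` are assumptions for illustration):

```r
# Hypothetical example: S2 has two different latitudes
toy <- data.frame(
  Study_Specific_Location_ID = c("S1", "S1", "S2", "S2"),
  Latitude_Decimal_Degrees = c(46.98, 46.98, 46.99, 47.00),
  Longitude_Decimal_Degrees = c(-122.97, -122.97, -122.98, -122.98)
)

# Number of distinct coordinate values per location
n_lat <- tapply(toy$Latitude_Decimal_Degrees, toy$Study_Specific_Location_ID,
                function(x) length(unique(x)))
n_lon <- tapply(toy$Longitude_Decimal_Degrees, toy$Study_Specific_Location_ID,
                function(x) length(unique(x)))

# Locations with more than one latitude or longitude
inconsistent_sites <- names(n_lat)[n_lat > 1 | n_lon > 1]
inconsistent_sites
```

An empty result would mean every location has a single coordinate pair.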
Calculate the proportion of QA (replicate) samples. The result should be 0.10 or greater (at least 10% of the total).
replicate_flag_counts <- df %>%
summarise(
total_count = n(),
replicate_count = sum(Sample_Replicate_Flag == "Y"),
replicate_proportion = replicate_count / total_count
)
replicate_flag_counts
## total_count replicate_count replicate_proportion
## 1 281 27 0.09608541
At 0.096, the replicate proportion falls just short of the 10% target. Recommend that the team increase QA sample frequency.
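To put a number on the shortfall, the arithmetic below estimates how many additional replicates would bring the proportion to the target. It is a sketch under a simplifying assumption: each new replicate adds one record to both the replicate count and the total, and no other samples are added.

```r
# Counts taken from the output above
total_count <- 281
replicate_count <- 27
target <- 0.10

# Solve (replicate_count + k) / (total_count + k) >= target for the smallest whole k
extra_needed <- max(0, ceiling((target * total_count - replicate_count) / (1 - target)))
extra_needed   # 2

# Proportion after adding those replicates
new_proportion <- (replicate_count + extra_needed) / (total_count + extra_needed)
new_proportion
```

In practice routine sampling continues, so the required number of replicates grows with the total.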
Check that sample values are within the expected range. If no records are returned, no values are out of range. Any records that are returned should be reviewed and accepted or rejected accordingly.
## Check expected values
# E. coli is expected to be between 1 and 2419.6
# TP is expected to be between 0.005 and 1000
# Filter records for Total Phosphorus and E. coli that fall outside these ranges.
# The parameter name is stored as "E.coli" (no space) in this dataset.
invalid_records <- df %>%
  filter(
    (Result_Parameter_Name == "Total Phosphorus" & (Result_Value < 0.005 | Result_Value > 1000)) |
    (Result_Parameter_Name == "E.coli" & (Result_Value < 1 | Result_Value > 2419.6))
  )
# Display the records that fall out of bounds
invalid_records
## [1] Study_ID Location_ID
## [3] Study_Specific_Location_ID Field_Collection_Type
## [5] Field_Collector Field_Collection_Start_Date
## [7] Field_Collection_Comment Latitude_Decimal_Degrees
## [9] Longitude_Decimal_Degrees Sample_ID
## [11] Sample_Field_Replicate_ID Sample_Replicate_Flag
## [13] Sample_Composite_Flag Sample_Matrix
## [15] Sample_Source Sample_Collection_Method
## [17] Sample_Preparation_Method Result_Parameter_Name
## [19] Result_Parameter_CAS_Number Lab_Analysis_Date
## [21] Result_Value Result_Value_Units
## [23] Result_Reporting_Limit Result_Reporting_Limit_Type
## [25] Result_Detection_Limit Result_Detection_Limit_Type
## [27] Result_Data_Qualifier Fraction_Analyzed
## [29] Result_Method Result_Lab_Name
## <0 rows> (or 0-length row.names)
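A range check like this depends on the parameter names matching the data exactly. One way to keep names and limits in one place is a small lookup table joined to the results. A base R sketch on hypothetical values (the `toy` data frame and the short column names `param` and `value` are assumptions for illustration; the limits are the ones stated above):

```r
# Expected ranges per parameter, as stated in the report
limits <- data.frame(
  param = c("E.coli", "Total Phosphorus"),
  lower = c(1, 0.005),
  upper = c(2419.6, 1000)
)

# Hypothetical results; the first value is below the E. coli lower limit
toy <- data.frame(
  param = c("E.coli", "E.coli", "Total Phosphorus"),
  value = c(0.5, 410, 0.041)
)

# Attach the limits to each record, then keep records outside their range
checked <- merge(toy, limits, by = "param")
out_of_range <- checked[checked$value < checked$lower | checked$value > checked$upper, ]
out_of_range
```

Parameters missing from the lookup table drop out of the merge, so comparing `nrow(checked)` with the expected record count is a quick guard against a name mismatch.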
Relative Percent Difference
Check the Relative Percent Difference (RPD) for E. coli and phosphorus samples. Samples outside defined thresholds may be rejected.
\[ RPD = \frac{\lvert R1-R2 \rvert}{\frac{R1+R2}{2}}\times100 \]
Here R1 and R2 are the two results in a pair (the routine sample and its field replicate).
Total Phosphorus: The TP QA sample value must be within 20% RPD of the routine sample value. If the mean TP of the pair is below 0.025, the pair is accepted even if the RPD falls outside the 20% range.
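As a worked example of the formula, the sketch below wraps it in a small helper (`rpd` is a name introduced here for illustration) and applies it to two of the TP pairs reported in the results table that follows:

```r
# Relative percent difference between two paired results
rpd <- function(r1, r2) abs(r1 - r2) / ((r1 + r2) / 2) * 100

# BLA005 on 7/25/2023: replicate 0.039, routine 0.050
rpd(0.039, 0.050)   # about 24.72, which exceeds the 20% threshold

# BLA009 on 10/12/2023: replicate 0.024, routine 0.023
rpd(0.024, 0.023)   # about 4.26
```

Because the denominator is the pair mean, the result is the same whichever value is treated as R1.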
# Filter dataframe by result_parameter_name = 'Total Phosphorus'
total_phosphorus_data <- df %>%
filter(Result_Parameter_Name == "Total Phosphorus")
# Filter for Sample_Replicate_Flag == "Y"
replicate_Y_data <- total_phosphorus_data %>%
filter(Sample_Replicate_Flag == "Y")
# Pair rows for Sample_Replicate_Flag == "Y" with Sample_Replicate_Flag == "N"
paired_data <- replicate_Y_data %>%
left_join(total_phosphorus_data %>%
filter(Sample_Replicate_Flag == "N"),
by = c("Study_Specific_Location_ID", "Field_Collection_Start_Date"),
suffix = c("_Y", "_N"))
# Calculate the relative percent difference between the two paired values
paired_data <- paired_data %>%
mutate(relative_percent_difference = abs(Result_Value_Y - Result_Value_N) / ((Result_Value_Y + Result_Value_N) / 2) * 100)
# Return a table showing both paired values, Study_Specific_Location_ID, Field_Collection_Start_Date, Result_Value, and calculated relative percent difference values
TP_result_table <- paired_data %>%
select(Study_Specific_Location_ID, Field_Collection_Start_Date, Result_Value_Y, Result_Value_N, relative_percent_difference)
# Print the table
print(TP_result_table)
## Study_Specific_Location_ID Field_Collection_Start_Date Result_Value_Y
## 1 BLA003 7/12/2023 0.041
## 2 BLA004 7/20/2023 0.041
## 3 BLA005 7/25/2023 0.039
## 4 BLA007 9/14/2023 0.039
## 5 BLA008 10/12/2023 0.015
## 6 BLA009 10/12/2023 0.024
## 7 BLA001 12/14/2023 0.028
## 8 BLA009 12/15/2023 0.015
## Result_Value_N relative_percent_difference
## 1 0.041 0.000000
## 2 0.041 0.000000
## 3 0.050 24.719101
## 4 0.041 5.000000
## 5 0.015 0.000000
## 6 0.023 4.255319
## 7 0.030 6.896552
## 8 NA NA
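Row 8 has no routine value, so its RPD is NA. A left join keeps every replicate even when no routine record shares its location and date, for example when the two records carry different collection dates. A minimal sketch with hypothetical data, using base `merge()` in place of `left_join()` (site names, dates, and values are made up):

```r
# Hypothetical replicate and routine records; S2 dates differ by one day
replicates <- data.frame(
  site = c("S1", "S2"),
  date = c("12/14/2023", "12/15/2023"),
  value_Y = c(0.028, 0.015)
)
routine <- data.frame(
  site = c("S1", "S2"),
  date = c("12/14/2023", "12/14/2023"),
  value_N = c(0.030, 0.016)
)

# all.x = TRUE keeps replicates that have no matching routine sample
paired <- merge(replicates, routine, by = c("site", "date"), all.x = TRUE)

# Replicates that could not be paired
unmatched <- paired[is.na(paired$value_N), ]
unmatched
```

Listing unmatched replicates separately makes it clear which pairs need a manual check before an RPD can be computed.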
# Assign a group to each row based on the average of the paired Result_Value
TP_result_table <- TP_result_table %>%
mutate(
group = ifelse((Result_Value_Y + Result_Value_N) / 2 < 0.025, "accepted", "evaluate"),
highlight = ifelse(relative_percent_difference > 20, "highlight", "")
)
# Print the table with group and highlight columns
print(TP_result_table)
## Study_Specific_Location_ID Field_Collection_Start_Date Result_Value_Y
## 1 BLA003 7/12/2023 0.041
## 2 BLA004 7/20/2023 0.041
## 3 BLA005 7/25/2023 0.039
## 4 BLA007 9/14/2023 0.039
## 5 BLA008 10/12/2023 0.015
## 6 BLA009 10/12/2023 0.024
## 7 BLA001 12/14/2023 0.028
## 8 BLA009 12/15/2023 0.015
## Result_Value_N relative_percent_difference group highlight
## 1 0.041 0.000000 evaluate
## 2 0.041 0.000000 evaluate
## 3 0.050 24.719101 evaluate highlight
## 4 0.041 5.000000 evaluate
## 5 0.015 0.000000 accepted
## 6 0.023 4.255319 accepted
## 7 0.030 6.896552 evaluate
## 8 NA NA <NA> <NA>
# Values marked "highlight" in the table are rejected (REJ)
RPD for samples needs to be evaluated in context. Results returned in the table as “accepted” will be accepted regardless of RPD because the sample concentrations are close to the detection limit.
Those labeled “evaluate” should be assessed based on both the RPD value and what is known about the site heterogeneity.
BLA005, collected on 7/25/2023, does not meet quality standards and will be removed from the analysis.
E. coli: Bacteria samples with low counts tend to have higher variability. Therefore, EC sample pairs (sample and field duplicate) will be separated into two groups:
• “low count samples”, where the pair mean ≤ 20 MPN/100 mL
• “higher count samples”, where the pair mean > 20 MPN/100 mL
For precision of bacteria field replicates:
• 50% of the replicate pairs must be at or below 20% RPD
• 90% of the pairs must be at or below 50% RPD
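The two precision criteria can be expressed as a small check on a vector of RPD values. The helper below (`meets_precision` is a name introduced here for illustration) is a sketch, applied to the five low count pairs from the E. coli results table in this report:

```r
# TRUE/FALSE for each criterion: at least 50% of pairs <= 20% RPD,
# and at least 90% of pairs <= 50% RPD
meets_precision <- function(rpd) {
  c(le20 = mean(rpd <= 20) >= 0.50,
    le50 = mean(rpd <= 50) >= 0.90)
}

# RPD values of the low count pairs (pair mean <= 20 MPN/100 mL)
low_count_rpd <- c(40, 0, 40, 0, 22.222222)
result <- meets_precision(low_count_rpd)
result
```

For these values the 20% criterion is not met (2 of 5 pairs) while the 50% criterion is (5 of 5 pairs).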
# Filter for E. coli
ecoli_data <- subset(df, Result_Parameter_Name == 'E.coli')
# Filter for Sample_Replicate_Flag == "Y"
replicate_Y_data <- ecoli_data %>%
filter(Sample_Replicate_Flag == "Y")
# Pair rows for Sample_Replicate_Flag == "Y" with Sample_Replicate_Flag == "N"
paired_df <- replicate_Y_data %>%
left_join(ecoli_data %>%
filter(Sample_Replicate_Flag == "N"),
by = c("Study_Specific_Location_ID", "Field_Collection_Start_Date"),
suffix = c("_Y", "_N"))
# Calculate the relative percent difference between the two paired values
paired_df <- paired_df %>%
mutate(relative_percent_difference = abs(Result_Value_Y - Result_Value_N) / ((Result_Value_Y + Result_Value_N) / 2) * 100)
EC_result_table <- paired_df %>%
select(Study_Specific_Location_ID, Field_Collection_Start_Date, Result_Value_Y, Result_Value_N, relative_percent_difference)
# Return the table
EC_result_table
## Study_Specific_Location_ID Field_Collection_Start_Date Result_Value_Y
## 1 BLA003 7/12/2023 410
## 2 BLA004 7/20/2023 365
## 3 BLA005 7/25/2023 63
## 4 BLA00203 9/14/2023 186
## 5 BLA007 9/14/2023 816
## 6 BLA008 10/12/2023 4
## 7 BLA009 10/12/2023 127
## 8 BLA00502 11/27/2023 3
## 9 BLA001 12/14/2023 15
## 10 BLA009 12/14/2023 9
## 11 BLA00204 1/9/2024 29
## 12 BLA002 1/11/2024 49
## 13 BLA004 1/11/2024 8
## 14 BLA00301 1/16/2024 93
## 15 BLA002 1/22/2024 51
## 16 BLA00705 1/22/2024 44
## Result_Value_N relative_percent_difference
## 1 652 45.574388
## 2 236 42.928453
## 3 81 25.000000
## 4 214 14.000000
## 5 980 18.262806
## 6 6 40.000000
## 7 27 129.870130
## 8 3 0.000000
## 9 10 40.000000
## 10 9 0.000000
## 11 52 56.790123
## 12 45 8.510638
## 13 10 22.222222
## 14 111 17.647059
## 15 47 8.163265
## 16 56 24.000000
# Assign a group based on the average of the Result_Value_Replicate_Y and Result_Value_Replicate_N
EC_result_table$group <- ifelse((EC_result_table$Result_Value_Y + EC_result_table$Result_Value_N) / 2 <= 20,
"low count samples", "higher count samples")
# Calculate the percent of values with Relative_Percent_Difference less than or equal to 20 by group
percent_diff_20 <- with(EC_result_table, tapply(relative_percent_difference <= 20, group, mean) * 100)
# Calculate the percent of values with Relative_Percent_Difference less than or equal to 50 by group
percent_diff_50 <- with(EC_result_table, tapply(relative_percent_difference <= 50, group, mean) * 100)
# Combine the results into a data frame
percent_diff_results <- data.frame(
Group = c("low count samples", "higher count samples"),
Percent_Diff_LE_20 = c(percent_diff_20["low count samples"], percent_diff_20["higher count samples"]),
Percent_Diff_LE_50 = c(percent_diff_50["low count samples"], percent_diff_50["higher count samples"])
)
# Return the results
percent_diff_results
## Group Percent_Diff_LE_20 Percent_Diff_LE_50
## low count samples low count samples 40.00000 100.00000
## higher count samples higher count samples 45.45455 81.81818
# • 50% of the replicate pairs must be at or below 20% RPD
# • 90% of the pairs must be at or below 50% RPD
The replicate pair for BLA009 on 10/12/2023 stands out as an anomaly, with an RPD of about 130%. This sample will be removed from the analysis because it is very likely not representative of the system.
50% of replicate pairs must be at or below 20% RPD. Currently, 40% of low count pairs and 45.5% of higher count pairs meet this threshold.
90% of replicate pairs must be at or below 50% RPD. Currently, 100% of low count pairs and 81.8% of higher count pairs meet this threshold.
Low count samples with non-anomalous larger RPD values will be included in analyses for the time being because the data set is currently incomplete. Once more replicate samples have been collected, the percent of pairs meeting the thresholds may meet quality standards. A final determination will be made when all data have been collected and the project has concluded in 2025.
Analysis should be re-run with the corrected dataframe to see how quality assessment improves.
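As a rough preview of that re-run, the sketch below recomputes the higher count percentages with the anomalous BLA009 pair left out. The RPD values are hard-coded from the E. coli results table above rather than recalculated from the data, so this is an illustration, not a replacement for re-running the assessment.

```r
# RPD values of the 11 higher count pairs (pair mean > 20 MPN/100 mL)
higher_count_rpd <- c(45.574388, 42.928453, 25, 14, 18.262806, 129.870130,
                      56.790123, 8.510638, 17.647059, 8.163265, 24)

# Drop the anomalous BLA009 pair from 10/12/2023 (RPD of about 130%)
kept <- higher_count_rpd[higher_count_rpd < 129]

# Percent of remaining pairs at or below each RPD threshold
pct_le_20 <- round(mean(kept <= 20) * 100, 1)
pct_le_50 <- round(mean(kept <= 50) * 100, 1)
pct_le_20   # 50
pct_le_50   # 90
```

With that one pair excluded, the higher count group sits right at both thresholds, so additional replicate pairs could move the result either way.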
Now that the QA Assessment has run, data is prepped for analysis. Remove all QA records and any other rejected records:
# Create a new dataframe with the specified conditions
new_df <- df %>%
filter(Sample_Replicate_Flag != "Y") %>%
filter(!(Study_Specific_Location_ID == "BLA005" & Field_Collection_Start_Date == "7/25/2023")) %>%
filter(!(Study_Specific_Location_ID == "BLA009" & Field_Collection_Start_Date == "10/12/2023"))
Define the date field so it may be recognized as a linear measure of time:
##Convert date class
new_df$Field_Collection_Start_Date
new_df <- new_df %>%
mutate(Field_Collection_Start_Date = mdy(Field_Collection_Start_Date))
head(new_df)
new_df$Field_Collection_Start_Date
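Dates stored as text in month/day/year form do not order or subtract as dates until they are converted. A minimal sketch using base `as.Date()` with an explicit format (swapped in here for `mdy()` so the example has no package dependency), on three dates that appear in the tables above:

```r
# Dates as they appear in the raw data (month/day/year text)
raw_dates <- c("7/12/2023", "10/12/2023", "1/9/2024")

# Convert to Date class using an explicit month/day/year format
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")
format(parsed)

# Once converted, dates can be ordered and differenced
order(parsed)
span_days <- as.numeric(diff(range(parsed)))
span_days
```

A value that does not match the format is returned as NA, which makes parsing failures easy to find with `is.na()`.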
Add a new field, “Season”, defined by the Field_Collection_Start_Date:
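# Extract the month as a number so it can be compared numerically
new_df$Month <- month(new_df$Field_Collection_Start_Date)
new_df$Month
colnames(new_df)
# Assumption: the dry season is May through September (months 5 to 9); all other months are "Wet".
# The two conditions are combined with & because "Month > 4 | Month < 10" is true for every month.
new_df <- new_df %>%
  mutate(Season = case_when(
    Month > 4 & Month < 10 ~ "Dry",
    TRUE ~ "Wet"))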
new_df$Season
new_df
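A month-based season rule is easy to spot-check on a few known months. The sketch below assumes the dry season runs May through September; that boundary is an assumption, since the report does not state it. The helper name `assign_season` is introduced here for illustration.

```r
# Assumed rule: months 5 to 9 are "Dry", all other months are "Wet"
assign_season <- function(month_num) {
  ifelse(month_num >= 5 & month_num <= 9, "Dry", "Wet")
}

seasons <- assign_season(c(1, 4, 5, 7, 9, 10, 12))
seasons

# Combining the two bounds with "or" instead of "and" matches every month,
# so the bounds must be joined with &
always_true <- all((1:12) > 4 | (1:12) < 10)
always_true
```

Comparing numeric months also avoids the string comparison that happens when the month is stored as text such as "07".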
Save the prepped dataset as an object
save(new_df, file="J:/Git_WS/R-Code-Sample/Inputs/new_df.rda")
write.csv(new_df, file="O:/EH_Health/Surface Water/+ PIC/Projects/Black Lake Grant 2022-25/Data/Black_Lake_Data_Prepped.csv", row.names=FALSE)
Look at numeric distributions:
# Plot numeric distributions of Result_Value by Result_Parameter_Name
ggplot(new_df, aes(x = Result_Value)) +
geom_histogram(bins = 30) +
facet_wrap(~Result_Parameter_Name, scales = "free_x") +
theme_minimal() +
xlab("Result Value") +
ylab("Frequency") +
ggtitle("Distributions of Result Value by Parameter Name")
##check normality
# Check the normality of Result_Value by Result_Parameter_Name using QQ plots
ggplot(new_df, aes(sample = Result_Value)) +
geom_qq() +
geom_qq_line() +
facet_wrap(~Result_Parameter_Name) +
theme_minimal() +
xlab("Theoretical Quantiles") +
ylab("Sample Quantiles") +
ggtitle("QQ Plots of Result Value by Parameter Name")
Create a boxplot using Plotly for E. coli. The plot is interactive.
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning: 'layout' objects don't have these attributes: 'boxmode'
## Valid attributes include:
## '_deprecated', 'activeshape', 'annotations', 'autosize', 'autotypenumbers', 'calendar', 'clickmode', 'coloraxis', 'colorscale', 'colorway', 'computed', 'datarevision', 'dragmode', 'editrevision', 'editType', 'font', 'geo', 'grid', 'height', 'hidesources', 'hoverdistance', 'hoverlabel', 'hovermode', 'images', 'legend', 'mapbox', 'margin', 'meta', 'metasrc', 'modebar', 'newshape', 'paper_bgcolor', 'plot_bgcolor', 'polar', 'scene', 'selectdirection', 'selectionrevision', 'separators', 'shapes', 'showlegend', 'sliders', 'smith', 'spikedistance', 'template', 'ternary', 'title', 'transition', 'uirevision', 'uniformtext', 'updatemenus', 'width', 'xaxis', 'yaxis', 'barmode', 'bargap', 'mapType'
Create a boxplot using Plotly for Total Phosphorus
Create plots of E. coli and Total Phosphorus results over time. The plots are interactive: click a series in the legend on the right to hide or show it. Static PNGs are also available for download.
Calculate some preliminary statistics.
Return the total number of samples taken:
#how many samples were taken?
nrow(new_df)
## [1] 250
#250
# Make table summary
summary <- df %>%
summarise(
`Total number E. coli Samples Taken` = sum(Result_Parameter_Name == "E.coli"),
`Total number of Total Phosphorus Samples Taken` = sum(Result_Parameter_Name == "Total Phosphorus"),
`Number E. coli QA Samples` = sum(Result_Parameter_Name == "E.coli" & Sample_Replicate_Flag == "Y"),
`Number TP QA Samples` = sum(Result_Parameter_Name == "Total Phosphorus" & Sample_Replicate_Flag == "Y")
)
summary
## Total number E. coli Samples Taken
## 1 173
## Total number of Total Phosphorus Samples Taken Number E. coli QA Samples
## 1 88 16
## Number TP QA Samples
## 1 8
Calculate the geomean by routine site and season for E. coli:
#Remove all Segmentation samples and filter for E. coli only
df_filtered <- new_df %>%
filter(Field_Collection_Comment != "Segmentation", Result_Parameter_Name == "E.coli")
# Filter for "Wet" and "Dry" seasons only
df_season_filtered <- df_filtered %>%
filter(Season %in% c("Wet", "Dry"))
# Define a function to calculate the geometric mean
geomean <- function(x) exp(mean(log(x), na.rm = TRUE))
# Aggregate by season and study_specific_location_id and calculate the geomean and counts
aggregated_df <- df_season_filtered %>%
group_by(Study_Specific_Location_ID, Season) %>%
summarise(
Geomean = geomean(Result_Value),
`Total Number of Samples` = n(),
`% of Samples above 100` = sum(Result_Value >= 100) / n() * 100,
`% of Samples above 320` = sum(Result_Value >= 320) / n() * 100,
.groups = 'drop'
)
aggregated_df
## # A tibble: 18 × 6
## Study_Specific_Location_ID Season Geomean `Total Number of Samples`
## <fct> <chr> <dbl> <int>
## 1 BLA001 Dry 216. 9
## 2 BLA001 Wet 40.1 3
## 3 BLA002 Dry 155. 6
## 4 BLA002 Wet 35.1 3
## 5 BLA003 Dry 559. 8
## 6 BLA003 Wet 435. 3
## 7 BLA004 Dry 148. 9
## 8 BLA004 Wet 23.9 3
## 9 BLA005 Dry 102. 8
## 10 BLA005 Wet 10.5 3
## 11 BLA006 Dry 309. 4
## 12 BLA006 Wet 257. 2
## 13 BLA007 Dry 648. 9
## 14 BLA007 Wet 324. 3
## 15 BLA008 Dry 4.93 3
## 16 BLA008 Wet 3.91 3
## 17 BLA009 Dry 53.9 3
## 18 BLA009 Wet 11.6 2
## # ℹ 2 more variables: `% of Samples above 100` <dbl>,
## # `% of Samples above 320` <dbl>
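The geometric mean averages on the log scale, so it is less influenced by a few very high counts than the arithmetic mean. A short worked example with the same `geomean` definition used above, on made-up values:

```r
# Same definition as in the report
geomean <- function(x) exp(mean(log(x), na.rm = TRUE))

# Two values an order of magnitude apart on either side of 100
gm <- geomean(c(10, 1000))
gm                    # 100, up to floating-point rounding
mean(c(10, 1000))     # 505

# A zero drives the geometric mean to 0 because log(0) is -Inf
gm_zero <- geomean(c(0, 10, 1000))
gm_zero               # 0
```

Note that `na.rm = TRUE` only drops NA values; zeros are not removed, so a single zero result would set a site's geomean to 0.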
Calculate the geomean by routine site and season for Total Phosphorus:
df_tp <- new_df %>%
filter(Field_Collection_Comment != "Segmentation", Result_Parameter_Name == "Total Phosphorus")
# Filter for "Wet" and "Dry" seasons only
df_season_tp <- df_tp %>%
filter(Season %in% c("Wet", "Dry"))
# Define a function to calculate the geometric mean
geomean <- function(x) exp(mean(log(x), na.rm = TRUE))
# Aggregate by season and study_specific_location_id and calculate the geomean and counts
aggregated_df_tp <- df_season_tp %>%
group_by(Study_Specific_Location_ID, Season) %>%
summarise(
Geomean = geomean(Result_Value),
`Total Number of Samples` = n(),
`% of Samples above 0.01mg/L` = sum(Result_Value >= 0.01) / n() * 100,
.groups = 'drop'
)
aggregated_df_tp
## # A tibble: 18 × 5
## Study_Specific_Location_ID Season Geomean `Total Number of Samples`
## <fct> <chr> <dbl> <int>
## 1 BLA001 Dry 0.0800 8
## 2 BLA001 Wet 0.0356 3
## 3 BLA002 Dry 0.0677 6
## 4 BLA002 Wet 0.0440 3
## 5 BLA003 Dry 0.0548 8
## 6 BLA003 Wet 0.042 3
## 7 BLA004 Dry 0.0365 8
## 8 BLA004 Wet 0.0209 3
## 9 BLA005 Dry 0.0330 7
## 10 BLA005 Wet 0.0187 3
## 11 BLA006 Dry 0.0962 4
## 12 BLA006 Wet 0.0245 2
## 13 BLA007 Dry 0.0335 8
## 14 BLA007 Wet 0.0260 3
## 15 BLA008 Dry 0.0124 2
## 16 BLA008 Wet 0.0196 3
## 17 BLA009 Dry 0.0310 2
## 18 BLA009 Wet 0.0163 2
## # ℹ 1 more variable: `% of Samples above 0.01mg/L` <dbl>
Calculate E. coli geomean for segmented sites by season:
#Repeat for segmented sites
df_seg <- new_df %>%
filter(Field_Collection_Comment == "Segmentation", Result_Parameter_Name == "E.coli")
# Filter for "Wet" and "Dry" seasons only
df_seg_filtered <- df_seg %>%
filter(Season %in% c("Wet", "Dry"))
# Define a function to calculate the geometric mean
geomean <- function(x) exp(mean(log(x), na.rm = TRUE))
# Aggregate by season and study_specific_location_id and calculate the geomean and counts
aggregated_df_seg <- df_seg_filtered %>%
group_by(Study_Specific_Location_ID, Season) %>%
summarise(
Geomean = geomean(Result_Value),
`Total Number of Samples` = n(),
`% of Samples above 100` = sum(Result_Value >= 100) / n() * 100,
`% of Samples above 320` = sum(Result_Value >= 320) / n() * 100,
.groups = 'drop'
)
aggregated_df_seg
## # A tibble: 21 × 6
## Study_Specific_Location_ID Season Geomean `Total Number of Samples`
## <fct> <chr> <dbl> <int>
## 1 BLA002 Dry 113. 7
## 2 BLA00201 Dry 104. 7
## 3 BLA00202 Dry 83.7 7
## 4 BLA00203 Dry 144. 7
## 5 BLA00204 Dry 89.5 7
## 6 BLA003 Dry 530. 7
## 7 BLA00301 Dry 714. 7
## 8 BLA004 Wet 6.32 3
## 9 BLA005 Wet 12.8 3
## 10 BLA00501 Wet 11.0 3
## # ℹ 11 more rows
## # ℹ 2 more variables: `% of Samples above 100` <dbl>,
## # `% of Samples above 320` <dbl>